Main Findings

Executive Summary

COVID-19 data from the United Kingdom and the 27 European Union (EU) countries have been analysed to compare the numbers of confirmed cases and deaths between the United Kingdom and the EU countries for the period from 22 January 2020 to 18 January 2021.

Of all 28 countries, the United Kingdom has the highest total numbers of both confirmed cases and deaths, with France and Italy as the next two countries. However, for the comparison to be more reliable, the population size of each country should be taken into account.

The ratio calculated for this purpose, the number of confirmed cases per 100,000 people, shows that Czechia, Luxembourg and Slovenia are at the top of the list, while the United Kingdom ranks ninth, in the top third. The countries with the lowest confirmed cases ratio are Finland, Greece and Germany.

For the death rate, calculated as the number of deaths per 100,000 inhabitants, Belgium has the highest ratio, followed by Slovenia and Italy. The United Kingdom ranks fifth, just behind Czechia. Finland again has the lowest rate, followed by Cyprus and Estonia.

Data summary

Summary of data for the United Kingdom

The first period of significant growth in the number of new confirmed cases occurred between March and June, with a peak of 5,490 cases in late April. The second significant increase occurred between early October and January, peaking at 68,053 cases on January 8, 2021. For virus-related deaths, a first period can be distinguished between mid-March and June: the number of new deaths rose sharply until April, reaching a peak of 1,224, and then fell to near zero. The second period of noticeable increases began in October and, after a slight decline in mid-December, reached its highest values in mid-January 2021.

Figure 1: Daily number of COVID-19 confirmed cases and deaths on timeline for United Kingdom

In both series, two periods of growth can be seen, with a significant decline in between. The decrease in the daily numbers of confirmed cases and deaths is most likely related to the authorities' decision to introduce significant restrictions on residents' movement and to limit the operation of some businesses, especially those providing on-site customer service.

The relaxation of the restrictions, together with a new, faster-spreading variant of the virus, is likely to have made the second phase of growth much higher than the first. The noticeable decline around mid-January 2021 may be related to new restrictions introduced by the UK Government.

Summary of data for all European countries

Each of the twenty-eight countries experienced an increase in the cumulative number of confirmed cases in the first quarter of 2020. In most countries, this number levelled off in the middle of the year and then rose significantly towards the end of 2020. The same pattern holds for the cumulative number of deaths.

Figure 2: COVID-19 confirmed cases counted cumulatively in UK and EU countries

Figure 3: COVID-19 deaths counted cumulatively in UK and EU countries

Comparing the total numbers of confirmed cases and deaths across all 28 countries, the United Kingdom has the highest values for both.

Figure 4: Total number of COVID-19 Confirmed Cases for UK and EU countries

Figure 5: Total number of COVID-19 Deaths for UK and EU countries

To make the comparison fair, each country’s population should be taken into account as an additional factor. For this purpose, the numbers of deaths and confirmed cases per 100,000 inhabitants have been calculated.
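Written out, the ratio for a country c is given below, where N_c is the cumulative number of confirmed cases (or deaths) on 18 January 2021 and P_c is the 2018 population; these symbols are introduced here only for illustration. The worked values use the Czechia row of Table 4 and reproduce the ratios listed there.

```latex
\begin{aligned}
\mathrm{Ratio}_c &= \frac{N_c}{P_c} \times 100\,000 \\
\mathrm{Conf\_cases\_Ratio}_{\mathrm{Czechia}} &= \frac{891\,852}{10\,610\,055} \times 100\,000 \approx 8405.72 \\
\mathrm{Deaths\_Ratio}_{\mathrm{Czechia}} &= \frac{14\,449}{10\,610\,055} \times 100\,000 \approx 136.18
\end{aligned}
```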

Czechia, Luxembourg and Slovenia have the highest confirmed cases ratios. The United Kingdom ranks ninth, in the top third of the list, while the last three places belong to Germany, Greece and Finland.

Figure 6: COVID-19 Confirmed Cases Ratio in UK and EU countries

Figure 7: COVID-19 confirmed cases Ratio in UK and EU countries - map

The countries with the highest death ratios are Belgium, Slovenia and Italy, whereas Finland, Cyprus and Estonia have the lowest. The United Kingdom ranks fifth.

Figure 8: COVID-19 Deaths Ratio in UK and EU countries

Figure 9: COVID-19 deaths Ratio in UK and EU countries - map

Table 1 below summarizes the minimum, average and maximum values of the cumulative numbers of confirmed cases and deaths and of their ratios per 100,000 inhabitants across the 28 countries.

Table 1: Summary table with minimum, average and maximum values for covid_eu_uk_selected_merged_latest_day dataset
Statistic Confirmed_cases_cumulatively Deaths_cumulatively Confirmed_cases_Ratio Deaths_Ratio
Minimum 15742 175 734.70 11.21
Average 760652.2 18411.75 4359.62 86.95
Maximum 3433494 89860 8405.72 179.60
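The values in Table 1 are column-wise aggregates. The report's references suggest it was produced in R; the Python sketch below shows the same computation only as an illustration, applied to the six rows previewed in Table 4 rather than to all 28 countries, so its results are not expected to match Table 1. The helper name summarise is hypothetical.

```python
from statistics import mean

# Illustrative subset: the six rows previewed in Table 4
# (cumulative cases, cumulative deaths, cases ratio, deaths ratio).
rows = {
    "Austria": (394939, 7122, 4476.62, 80.73),
    "Belgium": (679771, 20472, 5963.64, 179.60),
    "Bulgaria": (212383, 8565, 3012.51, 121.49),
    "Croatia": (225128, 4655, 5483.58, 113.38),
    "Cyprus": (29130, 175, 3370.61, 20.25),
    "Czechia": (891852, 14449, 8405.72, 136.18),
}
columns = ["Confirmed_cases_cumulatively", "Deaths_cumulatively",
           "Confirmed_cases_Ratio", "Deaths_Ratio"]


def summarise(rows, columns):
    """Return {statistic: {column: value}} for minimum, average and maximum."""
    values = list(zip(*rows.values()))  # transpose: one tuple per column
    return {
        "Minimum": {c: min(v) for c, v in zip(columns, values)},
        "Average": {c: round(mean(v), 2) for c, v in zip(columns, values)},
        "Maximum": {c: max(v) for c, v in zip(columns, values)},
    }


summary = summarise(rows, columns)
for statistic, by_column in summary.items():
    print(statistic, by_column)
```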

Confirmed cases ratio vs population density

It is worth investigating whether the countries with the highest confirmed cases ratios are also the most densely populated. To compare the two measures, the following grouped bar chart has been prepared, with the countries sorted by confirmed cases ratio.

Figure 10: COVID-19 Confirmed cases Ratio vs Population density

The country with the highest population density is Malta. However, by confirmed cases ratio it ranks only twenty-first of the twenty-eight countries. Finland has the lowest number of confirmed cases per 100,000 people and also the lowest population density.

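Other countries, such as the Netherlands, Sweden, Lithuania and Slovenia, show that a higher population density cannot be directly associated with a higher confirmed COVID-19 cases ratio. For example, the Netherlands is the second most densely populated country but ranks eighth by confirmed cases ratio, while sparsely populated Lithuania ranks fourth.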
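One way to put a number on this observation, which the report itself does not use, is Spearman's rank correlation between population density and confirmed cases ratio. The Python sketch below computes it from the 28 value pairs in Table 2: each variable is ranked (tied values, such as Austria and Hungary or Estonia and Latvia, share their mean rank) and the Pearson correlation of the ranks is taken. A coefficient close to +1 would indicate that denser countries consistently have higher ratios; with only 28 countries, the value is best read as descriptive rather than as a formal test.

```python
# Population density (persons per km2) and confirmed cases per 100,000,
# copied from Table 2.
DATA = {
    "Malta": (1548.3, 3309.22),
    "Netherlands": (504.0, 5339.06),
    "Belgium": (375.3, 5963.64),
    "United Kingdom": (273.8, 5180.79),
    "Luxembourg": (235.1, 8121.20),
    "Germany": (234.7, 2487.41),
    "Italy": (202.9, 3951.63),
    "Denmark": (138.0, 3284.70),
    "Czechia": (137.7, 8405.72),
    "Poland": (123.6, 3788.94),
    "Portugal": (113.0, 5407.65),
    "Slovakia": (111.8, 4122.36),
    "Austria": (107.1, 4476.62),
    "Hungary": (107.1, 3606.97),
    "France": (105.6, 4345.12),
    "Slovenia": (102.9, 7229.45),
    "Cyprus": (94.4, 3370.61),
    "Spain": (93.1, 5007.56),
    "Romania": (83.1, 3559.30),
    "Greece": (82.5, 1386.49),
    "Croatia": (73.2, 5483.58),
    "Ireland": (70.9, 3619.64),
    "Bulgaria": (63.9, 3012.51),
    "Lithuania": (44.7, 5980.70),
    "Estonia": (30.4, 2830.95),
    "Latvia": (30.4, 2890.23),
    "Sweden": (25.0, 5172.66),
    "Finland": (18.1, 734.70),
}


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


density = [d for d, _ in DATA.values()]
cases_ratio = [r for _, r in DATA.values()]
rho = spearman(density, cases_ratio)
print(f"Spearman's rho (density vs confirmed cases ratio): {rho:.3f}")
```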

Table 2: Population density and COVID-19 confirmed cases ratio by country, listed first in descending order of population density and then in descending order of confirmed cases ratio
Sorted by population density:
Country Pop_density_[persons_per_km2] Conf_cases_Ratio
Malta 1548.3 3309.22
Netherlands 504.0 5339.06
Belgium 375.3 5963.64
United Kingdom 273.8 5180.79
Luxembourg 235.1 8121.20
Germany 234.7 2487.41
Italy 202.9 3951.63
Denmark 138.0 3284.70
Czechia 137.7 8405.72
Poland 123.6 3788.94
Portugal 113.0 5407.65
Slovakia 111.8 4122.36
Austria 107.1 4476.62
Hungary 107.1 3606.97
France 105.6 4345.12
Slovenia 102.9 7229.45
Cyprus 94.4 3370.61
Spain 93.1 5007.56
Romania 83.1 3559.30
Greece 82.5 1386.49
Croatia 73.2 5483.58
Ireland 70.9 3619.64
Bulgaria 63.9 3012.51
Lithuania 44.7 5980.70
Estonia 30.4 2830.95
Latvia 30.4 2890.23
Sweden 25.0 5172.66
Finland 18.1 734.70

Sorted by confirmed cases ratio:
Country Conf_cases_Ratio Pop_density_[persons_per_km2]
Czechia 8405.72 137.7
Luxembourg 8121.20 235.1
Slovenia 7229.45 102.9
Lithuania 5980.70 44.7
Belgium 5963.64 375.3
Croatia 5483.58 73.2
Portugal 5407.65 113.0
Netherlands 5339.06 504.0
United Kingdom 5180.79 273.8
Sweden 5172.66 25.0
Spain 5007.56 93.1
Austria 4476.62 107.1
France 4345.12 105.6
Slovakia 4122.36 111.8
Italy 3951.63 202.9
Poland 3788.94 123.6
Ireland 3619.64 70.9
Hungary 3606.97 107.1
Romania 3559.30 83.1
Cyprus 3370.61 94.4
Malta 3309.22 1548.3
Denmark 3284.70 138.0
Bulgaria 3012.51 63.9
Latvia 2890.23 30.4
Estonia 2830.95 30.4
Germany 2487.41 234.7
Greece 1386.49 82.5
Finland 734.70 18.1

Conclusions

Most of the countries analysed experienced two major periods of increase in COVID-19 cases and deaths over a similar period of time. It would be worth investigating the reasons for these rises and falls, using data on country-specific restrictions and other factors, such as new virus variants. Such data, however, were outside the scope of this report.

No evidence was found to support the idea that the most densely populated countries have the highest confirmed cases ratios.

Data preparation

1. Additional datasets

Dataset containing country codes

The source of the dataset with country codes

This dataset was retrieved from the United Nations Statistics Division (UNSD) and was used to obtain country names for the two other additional datasets, which contain population and population density data. For this purpose, the two columns containing country names and country codes were selected and saved.
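Selecting the two columns can be sketched in Python as building a code-to-name lookup. The column headers ("Country or Area", "ISO-alpha2 Code") are assumptions about the UNSD download, the three rows are a hypothetical excerpt, and the helper name names_by_code is illustrative.

```python
import csv
import io

# Hypothetical excerpt of the UNSD M49 table; the column names are assumptions.
SAMPLE = """Country or Area,M49 Code,ISO-alpha2 Code
Austria,040,AT
Belgium,056,BE
Greece,300,GR
"""


def names_by_code(text, name_col="Country or Area", code_col="ISO-alpha2 Code"):
    """Keep only the name and code columns, as a {code: name} lookup."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[code_col]: row[name_col] for row in reader}


print(names_by_code(SAMPLE))
```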

Datasets containing data about population and population density

The source of the dataset with population of European countries

The source of the dataset with population density of European countries

These datasets were retrieved from Eurostat, the statistical office of the European Union. The first dataset was used to obtain the population of European countries. Some numerical values were flagged with "p" (provisional); these flags were removed and the values formatted. The population density dataset contained data only up to 2018, so population data for the same year were used. The first column of both datasets contained text codes, which were used to remove the redundant provincial (regional) data and to extract the country codes.
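The cleaning steps can be sketched as follows: split the combined first column to obtain the geo code, keep only two-letter (country-level, NUTS 0) codes, and strip flags such as "p" from the numbers. The tab-separated layout, the JAN indicator code and the regional AT1 row (with a placeholder value) are assumptions for illustration; the two country values are the 2018 populations of Austria and Greece from Tables 3 and 4. The helper names are hypothetical.

```python
import csv
import io

# Hypothetical excerpt in the style of a Eurostat TSV download: the first
# column packs an indicator code and a geo code into one string.
# AT1 stands for a regional row; its value is a placeholder.
SAMPLE = (
    "indic_de,geo\\time\t2018 \n"
    "JAN,AT\t8822267 \n"
    "JAN,AT1\t123456 \n"
    "JAN,EL\t10741165 \n"
)


def parse_value(raw):
    """Convert '8822267', '8822267 p' (flagged) or ':' (missing) to int or None."""
    parts = raw.split()
    if not parts or parts[0] == ":":
        return None
    return int(parts[0])  # drop any trailing flag such as 'p' (provisional)


def country_level(text, year="2018"):
    """Return {geo_code: value} for two-letter (country-level) geo codes only."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [cell.strip() for cell in next(reader)]
    column = header.index(year)
    values = {}
    for row in reader:
        _indicator, geo = row[0].split(",")
        if len(geo) == 2:  # longer NUTS codes (e.g. AT1, AT11) are regions
            values[geo] = parse_value(row[column])
    return values


print(country_level(SAMPLE))
print(parse_value("8822267 p"))
```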

The population and population density data were combined into a single dataset containing only the 28 countries (the United Kingdom and the EU countries), and the country names from the country code dataset were added to it.

However, the resulting dataset lacked the names of two countries, because the European Union and the UNSD use different codes for them. The United Kingdom has the UNSD country code GB but the EU country code UK; likewise, Greece is GR according to the UNSD and EL in the EU. The names of these two countries were then added.
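One way to handle the two mismatched codes, assuming the combined dataset is keyed by Eurostat codes, is to translate them before looking up the UNSD names. The report does not say how the names were added, so this lookup-table approach and the helper name attach_names are hypothetical; the population and density values are from Tables 3 and 4.

```python
# Eurostat uses EL and UK where the UNSD (ISO alpha-2) codes are GR and GB
EU_TO_ISO = {"EL": "GR", "UK": "GB"}


def attach_names(eu_rows, names_by_iso):
    """Add a Country field to rows keyed by Eurostat country code."""
    merged = {}
    for eu_code, values in eu_rows.items():
        iso_code = EU_TO_ISO.get(eu_code, eu_code)  # most codes are identical
        merged[eu_code] = {"Country": names_by_iso.get(iso_code), **values}
    return merged


eu_rows = {
    "AT": {"Population_2018": 8822267, "Pop_density_[persons_per_km2]": 107.1},
    "EL": {"Population_2018": 10741165, "Pop_density_[persons_per_km2]": 82.5},
    "UK": {"Population_2018": 66273576, "Pop_density_[persons_per_km2]": 273.8},
}
names_by_iso = {"AT": "Austria", "GR": "Greece", "GB": "United Kingdom"}

for code, row in attach_names(eu_rows, names_by_iso).items():
    print(code, row["Country"])
```

Without the translation table, a direct lookup of EL and UK would return no name, which matches the two missing values shown in Table 3.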

Table 3: Missing values in eu_uk_pop_and_pop_density_merged dataset
Country Country_Code Population_2018 Pop_density_[persons_per_km2]
NA EL 10741165 82.5
NA UK 66273576 273.8

2. Datasets containing COVID-19 confirmed cases and deaths

The source of the dataset with COVID-19 confirmed cases

The source of the dataset with COVID-19 deaths

Both datasets contain global cumulative counts of COVID-19 confirmed cases and deaths, respectively. They were downloaded as RAW .csv files from the Kaggle website, where they are described as a regularly updated version of the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU).

They have an identical structure: columns with country/region names, province/state names, and the latitude and longitude of each country, followed by COVID-19 data stored in 363 columns, one for each day from 22/01/2020 to 18/01/2021.

They were reshaped into long format and filtered to keep only the United Kingdom and the 27 EU countries.
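Reshaping from wide to long format and filtering to the countries of interest could look like the Python sketch below. The column layout is assumed to follow the JHU CSSE time-series files (Province/State, Country/Region, Lat, Long, then one m/d/yy column per day); the sample rows are an illustrative excerpt. Keeping only rows with an empty Province/State, which drops overseas territories such as Bermuda, is an assumption, since the report does not say how sub-national rows were handled.

```python
import csv
import io
from datetime import datetime

# Hypothetical excerpt in the JHU CSSE time-series layout:
# four descriptive columns followed by one column per day.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Austria,47.5162,14.5501,0,0,0
Bermuda,United Kingdom,32.3078,-64.7505,0,0,0
,United Kingdom,55.3781,-3.436,0,0,0
,Switzerland,46.8182,8.2275,0,0,0
"""

# The report keeps the UK and the 27 EU members; two are enough here.
TARGET_COUNTRIES = {"Austria", "United Kingdom"}


def wide_to_long(text, countries):
    """Return (country, date, cumulative_count) tuples, one per country per day."""
    reader = csv.DictReader(io.StringIO(text))
    date_columns = reader.fieldnames[4:]  # every column after Lat/Long is a day
    records = []
    for row in reader:
        # Assumption: keep only country-level rows (empty Province/State)
        if row["Country/Region"] not in countries or row["Province/State"]:
            continue
        for column in date_columns:
            day = datetime.strptime(column, "%m/%d/%y").date()
            records.append((row["Country/Region"], day, int(row[column])))
    return records


long_rows = wide_to_long(SAMPLE, TARGET_COUNTRIES)
print(len(long_rows), long_rows[0])
```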

3. Combining prepared datasets into one

The COVID-19 confirmed cases and deaths data were merged with the population and population density data. Two new columns, containing the numbers of confirmed COVID-19 cases and deaths per 100,000 inhabitants, were then calculated and added.
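The merge-and-ratio step can be sketched in Python as an inner join on country followed by the per-100,000 calculation. The dictionaries stand in for the report's datasets and the helper name add_ratios is hypothetical; the values are the Austria, Belgium and Czechia rows of Table 4, so the printed ratios should agree with that table. Rounding to two decimals only mirrors the precision shown in Table 4.

```python
PER_100K = 100_000

# Values copied from Table 4: population in 2018 and cumulative totals on 2021-01-18
population = {"Austria": 8822267, "Belgium": 11398589, "Czechia": 10610055}
covid_totals = {
    "Austria": (394939, 7122),
    "Belgium": (679771, 20472),
    "Czechia": (891852, 14449),
}


def add_ratios(covid_totals, population):
    """Inner-join the two tables on country and add per-100,000 ratios."""
    merged = {}
    for country, (cases, deaths) in covid_totals.items():
        if country not in population:  # inner join: skip unmatched countries
            continue
        pop = population[country]
        merged[country] = {
            "Population_2018": pop,
            "covid_conf_cases_cum": cases,
            "covid_deaths_cum": deaths,
            "Conf_cases_Ratio": round(cases / pop * PER_100K, 2),
            "Deaths_Ratio": round(deaths / pop * PER_100K, 2),
        }
    return merged


for country, row in add_ratios(covid_totals, population).items():
    print(country, row["Conf_cases_Ratio"], row["Deaths_Ratio"])
```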

Because the COVID-19 figures in this dataset are cumulative for each day, each country's totals are given by the last available day, so the data were filtered to that day.
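Selecting the last available day can be sketched as finding the maximum date in the long-format data and keeping the rows for that date. In this hypothetical example, the 18 January 2021 values come from Table 4, while the 17 January values are made-up placeholders; the helper name latest_day is illustrative.

```python
from datetime import date

# (country, date, cumulative confirmed cases). The 2021-01-18 values are from
# Table 4; the 2021-01-17 values are placeholders, not real data.
rows = [
    ("Austria", date(2021, 1, 17), 390000),
    ("Austria", date(2021, 1, 18), 394939),
    ("Czechia", date(2021, 1, 17), 880000),
    ("Czechia", date(2021, 1, 18), 891852),
]


def latest_day(rows):
    """Keep only rows dated on the most recent date present in the data."""
    last = max(day for _, day, _ in rows)
    return [row for row in rows if row[1] == last]


print(latest_day(rows))
```

Filtering on the overall maximum date assumes that every country reports on that day; taking the maximum date per country would be the safer choice if some series ended earlier.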

Table 4: Preview of covid_eu_uk_selected_merged_latest_day dataset
Date Country Lat Long Population_2018 Pop_density_[persons_per_km2] covid_conf_cases_cum covid_deaths_cum Conf_cases_Ratio Deaths_Ratio
2021-01-18 Austria 47.5162 14.550100 8822267 107.1 394939 7122 4476.62 80.73
2021-01-18 Belgium 50.8333 4.469936 11398589 375.3 679771 20472 5963.64 179.60
2021-01-18 Bulgaria 42.7339 25.485800 7050034 63.9 212383 8565 3012.51 121.49
2021-01-18 Croatia 45.1000 15.200000 4105493 73.2 225128 4655 5483.58 113.38
2021-01-18 Cyprus 35.1264 33.429900 864236 94.4 29130 175 3370.61 20.25
2021-01-18 Czechia 49.8175 15.473000 10610055 137.7 891852 14449 8405.72 136.18

To present the situation in the United Kingdom, the data were filtered to the UK and two new columns were added, containing the number of new confirmed cases and new deaths for each day, calculated from the cumulative counts.

References

Eurostat. 2021a. “Population Change - Demographic Balance and Crude Rates at Regional Level (NUTS 3).” 2021. https://ec.europa.eu/eurostat/databrowser/view/DEMO_R_GIND3__custom_521298/default/table?lang=en.

Goldbloom, Anthony. 2021a. “COVID-19 Data from John Hopkins University.” 2021. https://www.kaggle.com/antgoldbloom/covid19-data-from-john-hopkins-university?select=RAW_global_deaths.csv.

Goldbloom, Anthony. 2021b. “COVID-19 Data from John Hopkins University.” 2021. https://www.kaggle.com/antgoldbloom/covid19-data-from-john-hopkins-university?select=RAW_global_confirmed_cases.csv.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.

Plotly Technologies Inc. 2015. “Collaborative Data Science.” Montreal, QC: Plotly Technologies Inc. 2015. https://plot.ly.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

United Nations Statistics Division. 2021. “Standard Country or Area Codes for Statistical Use (M49).” 2021. https://unstats.un.org/unsd/methodology/m49/overview.

Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software, Articles 59 (10): 1–23. https://doi.org/10.18637/jss.v059.i10.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Zhu, Hao, Timothy Tsai, and Thomas Travison. 2020. “kableExtra.” 2020. http://haozhu233.github.io/kableExtra/.